This program simplifies document data extraction by transforming structured Word documents into clean, organized outputs using intelligent pattern matching.
Designed for users who need fast and reliable extraction of key fields from complex documents, it supports uploading Microsoft Word files, identifies important details, and reformats them into a well-structured template.
The interface (accessible via this app link) provides both a web-based flow and a command-line flow, each with built-in logging and error tracking.
| File Name | Purpose |
|---|---|
| `documentExtraction.py` | Core logic for extracting, mapping, and saving structured data from Word files |
| `autoLogger.py` | Provides logging for outputs, errors, and search functionality |
| `docUpload.py` | Manages document upload workflows and prevents duplicate files |
| `flask_main.py` | Hosts a Flask web interface for uploading and processing documents |
| `default_main.py` | CLI-based coordination of upload, processing, and saving steps |
| `form_configurations.py` | Stores default and case-specific mapping configurations and regex patterns |
| `index.html` | User interface for file upload and process initiation |
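
To illustrate how regex patterns in a configuration can drive field extraction, here is a minimal sketch. It is not the code in `documentExtraction.py` or `form_configurations.py`; the names `FIELD_PATTERNS` and `extract_fields`, the field labels, and the sample lines are all assumptions made for illustration.

```python
import re

# Hypothetical configuration: output field name -> regex with one capture group.
# The project's real patterns live in form_configurations.py and are not shown here.
FIELD_PATTERNS = {
    "name": re.compile(r"^Name:\s*(.+)$", re.IGNORECASE),
    "date_of_referral": re.compile(r"^Date of referral:\s*(.+)$", re.IGNORECASE),
    "reason": re.compile(r"^Reason for referral:\s*(.+)$", re.IGNORECASE),
}


def extract_fields(lines, patterns=FIELD_PATTERNS):
    """Return {field: value} using the first line that matches each pattern."""
    extracted = {}
    for line in lines:
        text = line.strip()
        for field, pattern in patterns.items():
            if field in extracted:
                continue  # keep only the first match for each field
            match = pattern.match(text)
            if match:
                extracted[field] = match.group(1).strip()
    return extracted


# With python-docx, the lines could come from
# [p.text for p in docx.Document(path).paragraphs]; plain strings are used here
# so the sketch runs without any third-party package.
sample_lines = [
    "Referral Form",
    "Name: Jane Example",
    "Date of referral: 2024-01-15",
    "Reason for referral: Housing support",
]
fields = extract_fields(sample_lines)
```

Keeping the patterns in a dictionary separate from the extraction loop means a case-specific configuration can swap in different patterns without changing the loop.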
- **Intelligent Document Extraction**: Extracts structured data and key fields using pattern recognition and section mappings.
- **Mapping and Formatting**: Maps raw content into organized formats using configurable templates.
- **Logging System**: Tracks outputs and errors with a searchable logging interface.
- **Document Upload Handling**: Prevents duplicate uploads and ensures valid `.docx` files.
- **Web & CLI Interfaces**: Offers both a Flask-powered web interface and a default CLI version.
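
The upload checks described above could look something like the following sketch. How `docUpload.py` actually detects duplicates is not stated here, so this assumes a content-hash comparison; the `UploadRegistry` class and its messages are hypothetical.

```python
import hashlib
from pathlib import Path


class UploadRegistry:
    """Tracks content hashes of accepted files so repeats can be rejected.

    Assumption: duplicates are detected by SHA-256 of the file contents.
    The real project may compare file names or use another method.
    """

    def __init__(self):
        self._seen = set()

    def check(self, filename, data):
        """Return (accepted, reason) for an uploaded file."""
        if Path(filename).suffix.lower() != ".docx":
            return False, "not a .docx file"
        # A .docx file is a ZIP archive, and ZIP archives begin with b"PK".
        if not data.startswith(b"PK"):
            return False, "not a valid .docx archive"
        digest = hashlib.sha256(data).hexdigest()
        if digest in self._seen:
            return False, "duplicate upload"
        self._seen.add(digest)
        return True, "accepted"
```

Hashing the contents rather than comparing file names catches the same document uploaded under a different name, at the cost of reading the whole file.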
- `python-docx` for Word file handling
- `flask_main.py`
- `default_main.py`
- `.docx` file

Extracting referral details from social care referral form Word documents and transferring the relevant details to a case study form.
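
As a rough illustration of moving extracted details into a case study form, the sketch below fills a text template from a dictionary of fields. The template layout, the field names, and the `fill_case_study` helper are assumptions for illustration only; the project's actual template is not shown in this README.

```python
from collections import defaultdict
from string import Template

# Hypothetical case study template with $placeholders for each field.
CASE_STUDY_TEMPLATE = Template(
    "Case Study\n"
    "Client: $name\n"
    "Referred on: $date_of_referral\n"
    "Reason: $reason\n"
)


def fill_case_study(fields, template=CASE_STUDY_TEMPLATE, missing="Not provided"):
    """Render the template, using a placeholder for any field not extracted."""
    # defaultdict supplies the placeholder when the template asks for a
    # key that is absent, so a partial extraction does not raise KeyError.
    values = defaultdict(lambda: missing, fields)
    return template.substitute(values)


report = fill_case_study({"name": "Jane Example", "reason": "Housing support"})
print(report)
```

Filling absent fields with a visible placeholder keeps the output form complete and makes gaps in the referral easy to spot during review.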